distribution correction
Review for NeurIPS paper: DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction
The paper is thoroughly theoretically grounded, with clear explanations of the intuition and proofs for the approximations used. The significance of the contribution is large: most deep RL algorithms fall into exactly the ADP family that this paper proposes to modify, and the corrective feedback model can be slotted into most training loops without compatibility issues. As the authors note, it could also be used to guide exploration rather than just for post hoc transition correction. This is clearly relevant to the NeurIPS community, much of which makes use of this form of RL algorithm.
DisCor: Corrective Feedback in Reinforcement Learning via Distribution Correction
Deep reinforcement learning can learn effective policies for a wide range of tasks, but is notoriously difficult to use due to instability and sensitivity to hyperparameters. The reasons for this remain unclear. In this paper, we study how RL methods based on bootstrapping-based Q-learning can suffer from a pathological interaction between function approximation and the data distribution used to train the Q-function: with standard supervised learning, online data collection should induce corrective feedback, where new data corrects mistakes in old predictions. With dynamic programming methods like Q-learning, such feedback may be absent. This can lead to instability, sub-optimal convergence, and poor results when learning from noisy, sparse, or delayed rewards.
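The corrective-feedback idea can be sketched in a tabular setting: reweight Bellman backups so that updates whose targets bootstrap from high-error successor values are downweighted. The toy below is illustrative only, not the paper's exact scheme; the random MDP, the temperature heuristic, and the weight floor are all assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
n_s, n_a, gamma, lr = 4, 2, 0.9, 0.5

# Hypothetical random tabular MDP, for illustration only.
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))   # P[s, a] -> dist over s'
R = rng.random((n_s, n_a))                         # rewards in [0, 1]

Q = np.zeros((n_s, n_a))
Delta = np.zeros((n_s, n_a))   # running estimate of accumulated Bellman error

for _ in range(5000):
    target = R + gamma * (P @ Q.max(axis=1))        # Bellman backup targets
    err = np.abs(target - Q)
    # DisCor-style correction: downweight (s, a) pairs whose bootstrapped
    # targets depend on successor values with large estimated error.
    succ_err = P @ Delta.min(axis=1)                # expected successor error
    tau = succ_err.mean() + 1e-8                    # temperature (a heuristic here)
    w = np.exp(-gamma * succ_err / tau)
    w = np.maximum(w, 0.1)                          # floor, for this toy's stability
    Q += lr * w * (target - Q)
    Delta = err + gamma * (P @ Delta.min(axis=1))   # error-accumulation recursion

residual = np.abs(R + gamma * (P @ Q.max(axis=1)) - Q).max()
```

Because every weight stays positive, the reweighted iteration keeps the same fixed point as ordinary value iteration; the weights only change which entries get corrected first.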
DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections
Ofir Nachum, Yinlam Chow, Bo Dai, Lihong Li
In many real-world reinforcement learning applications, access to the environment is limited to a fixed dataset, instead of direct (online) interaction with the environment. When using this data for either evaluation or training of a new policy, accurate estimates of discounted stationary distribution ratios -- correction terms which quantify the likelihood that the new policy will experience a certain state-action pair normalized by the probability with which the state-action pair appears in the dataset -- can improve accuracy and performance. In this work, we propose an algorithm, DualDICE, for estimating these quantities. In contrast to previous approaches, our algorithm is agnostic to knowledge of the behavior policy (or policies) used to generate the dataset. Furthermore, it eschews any direct use of importance weights, thus avoiding potential optimization instabilities endemic of previous methods. In addition to providing theoretical guarantees, we present an empirical study of our algorithm applied to off-policy policy evaluation and find that our algorithm significantly improves accuracy compared to existing techniques.
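In a fully tabular setting where the MDP is known, the DualDICE objective reduces to a quadratic whose minimizer recovers the distribution-correction ratios exactly, which makes the idea easy to check numerically. The sketch below assumes a small random MDP (the actual algorithm works from samples with function approximation and never needs the dataset distribution in closed form):

```python
import numpy as np

rng = np.random.default_rng(1)
n_s, n_a, gamma = 4, 2, 0.9
n = n_s * n_a

# Hypothetical tabular MDP, target policy, and initial distribution.
P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))   # P[s, a] -> dist over s'
pi = rng.dirichlet(np.ones(n_a), size=n_s)         # target policy pi(a|s)
p0 = rng.dirichlet(np.ones(n_s))                   # initial state dist

# State-action transition matrix under pi: rows/cols indexed by (s, a).
P_pi = (P[:, :, :, None] * pi[None, None, :, :]).reshape(n, n)
mu0 = (p0[:, None] * pi).reshape(n)                # initial (s, a) dist under pi

# Ground-truth discounted stationary distribution d_pi (for checking only).
A = np.eye(n) - gamma * P_pi
d_pi = (1 - gamma) * np.linalg.solve(A.T, mu0)

# Off-policy dataset distribution d_D; known here only so we can verify.
d_D = rng.dirichlet(np.ones(n))

# Tabular DualDICE is a quadratic in nu:
#   min_nu 0.5 * E_{d_D}[(nu - gamma * P_pi nu)^2] - (1 - gamma) * E_{mu0}[nu]
# with first-order condition  A^T diag(d_D) A nu = (1 - gamma) mu0.
nu = np.linalg.solve(A.T @ np.diag(d_D) @ A, (1 - gamma) * mu0)
ratios = A @ nu                                    # estimated d_pi / d_D
```

Given these ratios, an off-policy estimate of any expectation under d_pi is just a reweighted average over the dataset distribution, which is exactly how they are used for policy evaluation.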
Skeptical Deep Learning with Distribution Correction
Mingxiao An, Yongzhou Chen, Qi Liu, Chuanren Liu, Guangyi Lv, Fangzhao Wu, Jianhui Ma
Recently deep neural networks have been successfully used for various classification tasks, especially for problems with massive perfectly labeled training data. However, it is often costly to have large-scale credible labels in real-world applications. One solution is to make supervised learning robust with imperfectly labeled input. In this paper, we develop a distribution correction approach that allows deep neural networks to avoid overfitting imperfect training data. Specifically, we treat the noisy input as samples from an incorrect distribution, which will be automatically corrected during our training process. We test our approach on several classification datasets with elaborately generated noisy labels. The results show significantly higher prediction and recovery accuracy with our approach compared to alternative methods.
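The idea of treating noisy labels as samples from an incorrect distribution that training then corrects can be illustrated with a forward loss correction, a generic technique from the noisy-label literature; this is a sketch of that family under an assumed known noise matrix, not necessarily the paper's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 2000, 5

# Synthetic binary task with asymmetric label noise (illustration only).
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y_clean = (X @ w_true > 0).astype(int)

# Noise transition matrix: T[i, j] = P(observed label j | true label i).
T = np.array([[0.8, 0.2],
              [0.3, 0.7]])
y_noisy = np.array([rng.choice(2, p=T[y]) for y in y_clean])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Forward correction: predict *noisy* label probabilities q = p @ T, so the
# cross-entropy gradient corrects for the noise instead of fitting it.
w = np.zeros(d)
for _ in range(1000):
    p1 = sigmoid(X @ w)                         # P(clean label = 1 | x)
    p = np.stack([1 - p1, p1], axis=1)          # clean class probabilities
    q = p @ T                                   # predicted noisy probabilities
    qy = q[np.arange(n), y_noisy]
    dT = T[1, y_noisy] - T[0, y_noisy]          # d q_y / d p1
    grad = -((dT / qy) * p1 * (1 - p1)) @ X / n # chain rule through correction
    w -= 0.5 * grad

acc = ((X @ w > 0).astype(int) == y_clean).mean()
```

The trained model is evaluated against the clean labels, which it never sees: because the loss models the noisy observation process explicitly, minimizing it pushes the underlying clean-label predictor toward the true decision boundary.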